Asymptotic Study of In-context Learning with Random Transformers through Equivalent Models
We study the in-context learning (ICL) capabilities of pretrained Transformers in the setting of nonlinear regression. Specifically, we focus on a random Transformer with a nonlinear MLP head whose first layer is randomly initialized and fixed while the second layer is trained. Furthermore, we consider an asymptotic regime where the context length, input dimension, hidden dimension, number of training tasks, and number of training samples jointly grow. In this setting, we show that the random Transformer behaves equivalently to a finite-degree Hermite polynomial model in terms of ICL error. This equivalence is validated through simulations across varying activation functions, context lengths, hidden layer widths (revealing a double-descent phenomenon), and regularization settings. Our results offer theoretical and empirical insights into when and how MLP layers enhance ICL, and how nonlinearity and over-parameterization influence model performance.
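The Hermite machinery behind such equivalences can be sketched numerically: the coefficients of an activation in the orthonormal probabilists' Hermite basis, c_k = E[σ(Z) He_k(Z)] / √(k!) with Z ~ N(0, 1), determine which polynomial degrees survive in the equivalent model. A minimal sketch via Gauss-Hermite quadrature (illustrative only, not the paper's construction):

```python
import numpy as np
from math import factorial, sqrt
from numpy.polynomial import hermite_e as He

def hermite_coeffs(sigma, degree, quad_points=80):
    """Coefficients c_k = E[sigma(Z) He_k(Z)] / sqrt(k!) of an activation
    in the orthonormal probabilists' Hermite basis, Z ~ N(0, 1),
    computed with Gauss-HermiteE quadrature."""
    x, w = He.hermegauss(quad_points)   # nodes/weights for weight exp(-x^2/2)
    w = w / np.sqrt(2.0 * np.pi)        # renormalize to the standard Gaussian
    coeffs = []
    for k in range(degree + 1):
        basis = np.zeros(k + 1)
        basis[k] = 1.0                  # selects He_k
        h_k = He.hermeval(x, basis) / sqrt(factorial(k))
        coeffs.append(float(np.sum(w * sigma(x) * h_k)))
    return np.array(coeffs)

# Low-order Hermite coefficients of ReLU; for reference, the exact values
# are c_0 = 1/sqrt(2*pi) and c_1 = 1/2.
c_relu = hermite_coeffs(lambda z: np.maximum(z, 0.0), degree=4)
```

Truncating this expansion at a finite degree gives exactly the kind of finite-degree Hermite polynomial model the abstract compares against.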
HOPS: High-order Polynomials with Self-supervised Dimension Reduction for Load Forecasting
Song, Pengyang, Feng, Han, Shukla, Shreyashi, Wang, Jue, Hong, Tao
Load forecasting is a fundamental task in smart grids. Many techniques have been applied to developing load forecasting models. Due to challenges such as the curse of dimensionality, overfitting, and limited computing resources, multivariate higher-order polynomial models have received limited attention in load forecasting, despite their desirable mathematical foundations and optimization properties. In this paper, we propose low-rank approximation and self-supervised dimension reduction to address these issues. To further improve computational efficiency, we also introduce a fast Conjugate Gradient based algorithm for the proposed polynomial models. On the ISO New England dataset used in the Global Energy Forecasting Competition 2017, the proposed method, High-Order Polynomials with Self-supervised dimension reduction (HOPS), demonstrates higher forecasting accuracy than several competitive models. Additionally, experimental results indicate that our approach alleviates redundant variable construction, achieving better forecasts with fewer input variables.
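The core computational idea, solving a regularized polynomial regression with conjugate gradients rather than a direct factorization, can be sketched as follows. This is a toy stand-in, not the authors' HOPS algorithm: the feature construction and self-supervised dimension reduction are simplified away, and all data below is synthetic.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A."""
    n = len(b)
    max_iter = max_iter or 10 * n
    x = np.zeros(n)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def fit_poly_ridge(X, y, degree=2, lam=1e-3):
    """Ridge regression on per-coordinate powers of the inputs (a crude
    stand-in for the higher-order terms in polynomial forecasting models),
    with the normal equations solved by conjugate gradients."""
    Phi = np.hstack([X ** d for d in range(1, degree + 1)])
    Phi = np.hstack([np.ones((len(X), 1)), Phi])
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    w = conjugate_gradient(A, Phi.T @ y)
    return w, Phi

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 + X[:, 0] - 0.5 * X[:, 1] ** 2 + 0.01 * rng.normal(size=200)
w, Phi = fit_poly_ridge(X, y)
```

CG only needs matrix-vector products with A, which is why it scales to the large feature matrices that make direct solves of high-order polynomial models expensive.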
Can Generative AI Solve Your In-Context Learning Problem? A Martingale Perspective
Jesson, Andrew, Beltran-Velez, Nicolas, Blei, David
This work is about estimating when a conditional generative model (CGM) can solve an in-context learning (ICL) problem. An ICL problem comprises a CGM, a dataset, and a prediction task (Brown et al., 2020; Dong et al., 2022). The CGM could be a multimodal foundation model; the dataset, a collection of patient histories, test results, and recorded diagnoses; and the prediction task, to communicate a diagnosis to a new patient. A Bayesian interpretation of ICL assumes that the CGM computes a posterior predictive distribution over an unknown Bayesian model defining a joint distribution over latent explanations and observable data. From this perspective, Bayesian model criticism is a reasonable approach to assess the suitability of a given CGM for an ICL problem. However, such approaches--like posterior predictive checks (PPCs)--often assume that we can sample from the likelihood and posterior defined by the Bayesian model, which are not explicitly given for contemporary CGMs. To address this, we show when ancestral sampling from the predictive distribution of a CGM is equivalent to sampling datasets from the posterior predictive of the assumed Bayesian model. We then develop the generative predictive p-value, which enables PPCs and their cousins for contemporary CGMs. The generative predictive p-value can then be used in a statistical decision procedure to determine when the model is appropriate for an ICL problem. Our method only requires generating queries and responses from a CGM and evaluating its response log probability. We empirically evaluate our method on synthetic tabular, imaging, and natural language ICL tasks using large language models.
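For intuition, a classical posterior predictive check on a fully specified conjugate Bayesian model can be sketched in a few lines; this is the setting the paper generalizes, since for a CGM neither the likelihood nor the posterior is explicit. The model, statistic, and data below are hypothetical illustrations.

```python
import numpy as np

def posterior_predictive_pvalue(y_obs, n_rep=2000, seed=0):
    """Toy PPC for the model mu ~ N(0, 1), y_i | mu ~ N(mu, 1), which is
    conjugate, so the posterior over mu is available in closed form.
    Test statistic: the sample variance, probing misfit in spread that
    the location parameter cannot absorb."""
    rng = np.random.default_rng(seed)
    n = len(y_obs)
    post_var = 1.0 / (1.0 + n)          # posterior over mu is Gaussian
    post_mean = post_var * y_obs.sum()
    t_obs = y_obs.var()
    exceed = 0
    for _ in range(n_rep):
        mu = rng.normal(post_mean, np.sqrt(post_var))
        y_rep = rng.normal(mu, 1.0, size=n)   # replicated dataset
        exceed += y_rep.var() >= t_obs
    return exceed / n_rep

rng = np.random.default_rng(1)
p_ok = posterior_predictive_pvalue(rng.normal(0.0, 1.0, size=100))   # well specified
p_bad = posterior_predictive_pvalue(rng.normal(0.0, 3.0, size=100))  # variance misfit
```

An extreme p-value flags that the assumed model cannot reproduce the observed data; the paper's generative predictive p-value plays the same role when the "model" is only accessible through a CGM's samples and log probabilities.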
Random Features Outperform Linear Models: Effect of Strong Input-Label Correlation in Spiked Covariance Data
Random Feature Model (RFM) with a nonlinear activation function is instrumental in understanding training and generalization performance in high-dimensional learning. While existing research has established an asymptotic equivalence in performance between the RFM and noisy linear models under isotropic data assumptions, empirical observations indicate that the RFM frequently surpasses linear models in practical applications. To address this gap, we ask, "When and how does the RFM outperform linear models?" In practice, inputs often have additional structures that significantly influence learning. Therefore, we explore the RFM under anisotropic input data characterized by spiked covariance in the proportional asymptotic limit, where dimensions diverge jointly while maintaining finite ratios. Our analysis reveals that a high correlation between inputs and labels is a critical factor enabling the RFM to outperform linear models. Moreover, we show that the RFM performs equivalently to noisy polynomial models, where the polynomial degree depends on the strength of the correlation between inputs and labels. Our numerical simulations validate these theoretical insights, confirming the performance-wise superiority of the RFM in scenarios characterized by strong input-label correlation.
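The model under study can be sketched in a few lines: a frozen random first layer with a nonlinearity, a ridge-trained linear readout, and a plain linear baseline, on data with a rank-one spiked covariance direction aligned with the labels. Toy illustration only; the dimensions and constants are arbitrary choices, not the paper's asymptotic regime.

```python
import numpy as np

def random_feature_ridge(X_tr, y_tr, X_te, width=200, lam=1e-2, seed=0):
    """Random feature model: the first layer W is random and frozen; only
    the second (linear readout) layer is trained, here by ridge regression."""
    rng = np.random.default_rng(seed)
    d = X_tr.shape[1]
    W = rng.normal(size=(d, width)) / np.sqrt(d)
    phi = lambda X: np.tanh(X @ W)          # fixed nonlinear featurization
    F = phi(X_tr)
    a = np.linalg.solve(F.T @ F + lam * np.eye(width), F.T @ y_tr)
    return phi(X_te) @ a

# Spiked-covariance inputs: a strong rank-one direction u, with labels
# that are a nonlinear function of the projection onto u.
rng = np.random.default_rng(42)
n, d = 400, 30
u = np.ones(d) / np.sqrt(d)
X_tr = rng.normal(size=(n, d)) + 3.0 * rng.normal(size=(n, 1)) * u
y_tr = np.tanh(X_tr @ u)
X_te = rng.normal(size=(200, d)) + 3.0 * rng.normal(size=(200, 1)) * u
y_te = np.tanh(X_te @ u)

mse_rf = np.mean((random_feature_ridge(X_tr, y_tr, X_te) - y_te) ** 2)
w_lin = np.linalg.solve(X_tr.T @ X_tr + 1e-2 * np.eye(d), X_tr.T @ y_tr)
mse_lin = np.mean((X_te @ w_lin - y_te) ** 2)
```

Because the spike inflates the variance of the label-relevant projection, the target saturates and a linear model incurs an irreducible approximation error that the nonlinear random features can reduce, which is the qualitative effect the abstract describes.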
Stochastic stem bucking using mixture density neural networks
Poor bucking decisions made by forest harvesters can have a negative effect on the products that are generated from the logs. Making the right bucking decisions is not an easy task because harvesters must rely on predictions of the stem profile for the part of the stem that has not yet been measured. The goal of this project is to improve the bucking decisions made by forest harvesters with a stochastic bucking method. We developed a Long Short-Term Memory (LSTM) neural network that predicted the parameters of a Gaussian distribution conditioned on the known part of the stem, enabling the creation of multiple samples of stem profile predictions for the unknown part of the stem. The bucking decisions could then be optimized using a novel stochastic bucking algorithm, which used all of the generated stem profiles to choose which logs to cut from the stem. The stochastic bucking algorithm was compared to two benchmark models: a polynomial model that could not condition its predictions on more than one diameter measurement, and a deterministic LSTM neural network. All models were evaluated on stem profiles of four coniferous species prevalent in eastern Canada. In general, the best bucking decisions were made by the stochastic LSTM models, demonstrating the usefulness of the method. The second-best results were mostly obtained by the deterministic LSTM model and the worst results by the polynomial model, corroborating the usefulness of conditioning the stem curve predictions on multiple measurements.
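The stochastic idea, averaging the value of each candidate log over many sampled stem profiles rather than trusting a single deterministic prediction, can be sketched with a toy linear-taper profile in place of the LSTM's predictive Gaussian. All lengths, prices, and diameters below are invented for illustration.

```python
import numpy as np

def stochastic_buck(d0, taper_mean, taper_std, lengths, prices, min_diam,
                    n_samples=20000, seed=0):
    """Choose the log length with the highest expected value over sampled
    stem profiles. Profile model: linear taper with Gaussian uncertainty,
    diameter(z) = d0 - taper * z, a toy stand-in for the LSTM's predictive
    Gaussian. A log earns its price only if its small-end diameter meets
    the product's minimum; otherwise it is worthless."""
    rng = np.random.default_rng(seed)
    tapers = rng.normal(taper_mean, taper_std, size=n_samples)
    expected = []
    for length, price in zip(lengths, prices):
        small_end = d0 - tapers * length       # sampled small-end diameters
        expected.append(price * np.mean(small_end >= min_diam))
    return lengths[int(np.argmax(expected))], expected

# Longer logs pay more but risk missing the 15 cm minimum diameter.
best, ev = stochastic_buck(d0=20.0, taper_mean=1.0, taper_std=0.3,
                           lengths=np.array([3.0, 4.0, 5.0]),
                           prices=np.array([10.0, 15.0, 22.0]),
                           min_diam=15.0)
```

With these invented numbers, a deterministic plan based on the mean taper would pick the 5 m log, whose predicted small end sits exactly at the minimum, while the stochastic rule discounts it for its roughly even odds of failure and settles on the 4 m log.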
Identifiable Latent Neural Causal Models
Liu, Yuhang, Zhang, Zhen, Gong, Dong, Gong, Mingming, Huang, Biwei, Hengel, Anton van den, Zhang, Kun, Shi, Javen Qinfeng
Causal representation learning seeks to uncover latent, high-level causal representations from low-level observed data. It is particularly effective for prediction under unseen distribution shifts, because these shifts can generally be interpreted as consequences of interventions. Hence, leveraging seen distribution shifts becomes a natural strategy for identifying causal representations, which in turn benefits predictions where distributions were previously unseen. Determining the types (or conditions) of such distribution shifts that contribute to the identifiability of causal representations is critical. This work establishes a necessary and sufficient condition characterizing the types of distribution shifts that yield identifiability in the context of latent additive noise models. Furthermore, we present partial identifiability results when only a portion of the distribution shifts meets the condition. In addition, we extend our findings to latent post-nonlinear causal models. We translate our findings into a practical algorithm, allowing for the acquisition of reliable latent causal representations. Our algorithm, guided by our underlying theory, has demonstrated outstanding performance across a diverse range of synthetic and real-world datasets. The empirical observations align closely with the theoretical findings, affirming the robustness and effectiveness of our approach.
Nonlinear In-situ Calibration of Strain-Gauge Force/Torque Sensors for Humanoid Robots
Mohamed, Hosameldin Awadalla Omer, Nava, Gabriele, Vanteddu, Punith Reddy, Braghin, Francesco, Pucci, Daniele
High force/torque (F/T) sensor calibration accuracy is crucial to achieving successful force estimation/control tasks with humanoid robots. State-of-the-art affine calibration models do not always correctly approximate the physical behavior of the sensor/transducer, resulting in inaccurate F/T measurements for specific applications such as thrust estimation of a jet-powered humanoid robot. This paper proposes and validates nonlinear polynomial models for F/T calibration, increasing the number of model coefficients to minimize the estimation residuals. The analysis of several models, based on data collected from experiments with the iCub3 robot, shows a significant improvement in minimizing the force/torque estimation error when using higher-degree polynomials. In particular, with a 4th-degree polynomial model, the root-mean-square error (RMSE) decreased to 2.28N from the 4.58N obtained with an affine model, and the absolute error in the forces remained under 6N, whereas it reached up to 16N with the affine model.
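The effect reported above can be reproduced in miniature: fit affine and 4th-degree polynomial calibrations to a synthetic single-channel transducer with a mild cubic nonlinearity. This is a hedged sketch; the paper's models couple all six F/T channels, and the numbers below are invented.

```python
import numpy as np

def fit_calibration(raw, force, degree):
    """Least-squares polynomial calibration raw -> force for a single
    channel, returning the coefficients and the training RMSE."""
    coeffs = np.polyfit(raw, force, degree)
    rmse = float(np.sqrt(np.mean((np.polyval(coeffs, raw) - force) ** 2)))
    return coeffs, rmse

# Synthetic transducer: dominant linear gain, mild cubic nonlinearity, noise.
rng = np.random.default_rng(0)
raw = np.linspace(-1.0, 1.0, 500)
force = 50.0 * raw + 8.0 * raw ** 3 + rng.normal(0.0, 0.3, size=500)

_, rmse_affine = fit_calibration(raw, force, degree=1)
_, rmse_quartic = fit_calibration(raw, force, degree=4)
```

The affine model is left with the entire cubic residual, while the 4th-degree model drives the error down to the noise floor, mirroring the RMSE drop the abstract reports on real iCub3 data.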
Disentangling CO Chemistry in a Protoplanetary Disk Using Explanatory Machine Learning Techniques
Diop, Amina, Cleeves, Ilse, Anderson, Dana, Pegues, Jamila, Plunkett, Adele
Molecular abundances in protoplanetary disks are highly sensitive to the local physical conditions, including gas temperature, gas density, radiation field, and dust properties. Often multiple factors are intertwined, impacting the abundances of both simple and complex species. We present a new approach to understanding these chemical and physical interdependencies using machine learning. Specifically, we explore the case of CO modeled under the conditions of a generic disk and build an explanatory regression model to study the dependence of CO spatial density on the gas density, gas temperature, cosmic ray ionization rate, X-ray ionization rate, and UV flux. Our findings indicate that combinations of parameters play a surprisingly powerful role in regulating CO compared to any singular physical parameter. Moreover, in general, we find the conditions in the disk are destructive toward CO. CO depletion is further enhanced in an increased cosmic ray environment and in disks with higher initial C/O ratios. These dependencies uncovered by our new approach are consistent with previous studies, which are more modeling intensive and computationally expensive. Our work thus shows that machine learning can be a powerful tool not only for creating efficient predictive models, but also for enabling a deeper understanding of complex chemical processes.
Converting Transformers to Polynomial Form for Secure Inference Over Homomorphic Encryption
Zimerman, Itamar, Baruch, Moran, Drucker, Nir, Ezov, Gilad, Soceanu, Omri, Wolf, Lior
Designing privacy-preserving deep learning models is a major challenge within the deep learning community. Homomorphic Encryption (HE) has emerged as one of the most promising approaches in this realm, enabling the decoupling of knowledge between the model owner and the data owner. Despite extensive research and application of this technology, primarily in convolutional neural networks, incorporating HE into transformer models has been challenging because of the difficulties in converting these models into a polynomial form. We break new ground by introducing the first polynomial transformer, providing the first demonstration of secure inference over HE with transformers. This includes a transformer architecture tailored for HE, alongside a novel method for converting operators to their polynomial equivalents. This innovation enables us to perform secure inference on LMs with WikiText-103. It also allows us to perform image classification with CIFAR-100 and Tiny-ImageNet. Our models yield results comparable to traditional methods, bridging the performance gap with transformers of similar scale and underscoring the viability of HE for state-of-the-art applications. Finally, we assess the stability of our models and conduct a series of ablations to quantify the contribution of each model component.
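The basic building block of such conversions, replacing a non-polynomial operator by a polynomial fit over its working range (HE schemes can only evaluate additions and multiplications), can be sketched with a plain least-squares fit of GELU. The paper's conversion method is more elaborate; the interval and degrees here are arbitrary illustration choices.

```python
import numpy as np

def poly_approx(f, degree, lo=-4.0, hi=4.0, n=2001):
    """Least-squares polynomial fit of f on [lo, hi], returning the
    coefficients and the maximum absolute error on the grid."""
    x = np.linspace(lo, hi, n)
    coeffs = np.polyfit(x, f(x), degree)
    max_err = float(np.max(np.abs(np.polyval(coeffs, x) - f(x))))
    return coeffs, max_err

# tanh-based GELU, as used in many transformer implementations.
gelu = lambda x: 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi)
                                          * (x + 0.044715 * x ** 3)))
_, err_deg8 = poly_approx(gelu, degree=8)
_, err_deg2 = poly_approx(gelu, degree=2)
```

Raising the degree tightens the approximation but deepens the multiplicative circuit the HE scheme must evaluate, which is the accuracy/depth trade-off such conversions navigate.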
Identifiable Latent Polynomial Causal Models Through the Lens of Change
Liu, Yuhang, Zhang, Zhen, Gong, Dong, Gong, Mingming, Huang, Biwei, Hengel, Anton van den, Zhang, Kun, Shi, Javen Qinfeng
Causal representation learning aims to unveil latent high-level causal representations from observed low-level data. One of its primary tasks is to provide reliable assurance of identifying these latent causal models, known as identifiability. A recent breakthrough explores identifiability by leveraging the change of causal influences among latent causal variables across multiple environments (Liu et al., 2022). However, this progress rests on the assumption that the causal relationships among latent causal variables adhere strictly to linear Gaussian models. In this paper, we extend the scope of latent causal models to involve nonlinear causal relationships, represented by polynomial models, and general noise distributions conforming to the exponential family. Additionally, we investigate the necessity of imposing changes on all causal parameters and present partial identifiability results when part of them remains unchanged. Further, we propose a novel empirical estimation method, grounded in our theoretical finding, that enables learning consistent latent causal representations. Our experimental results, obtained from both synthetic and real-world data, validate our theoretical contributions concerning identifiability and consistency. Causal representation learning, which aims to discover high-level latent causal variables and the causal structures among them from unstructured observed data, provides a prospective way to compensate for drawbacks in traditional machine learning paradigms, e.g., the fundamental requirement that the data driving and promoting machine learning methods be independent and identically distributed (i.i.d.) (Schölkopf, 2015).
From the perspective of causal representations, the changes in data distribution, arising from various real-world data collection pipelines (Karahan et al., 2016; Frumkin, 2016; Pearl et al., 2016; Chandrasekaran et al., 2021), can be attributed to changes of causal influences among causal variables (Schölkopf et al., 2021). These changes are observable across a multitude of fields. For instance, they could appear in the analysis of imaging data of cells, where the contexts involve batches of cells exposed to various small-molecule compounds.
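The data-generating process that such identifiability results reason about can be sketched for a toy two-variable latent polynomial causal model with exponential-family noise and environment-dependent causal weights. This is a hypothetical illustration of the setting, not the paper's estimation method; all weights and dimensions are invented.

```python
import numpy as np

def sample_latent_polynomial_scm(env_weights, n=1000, seed=0):
    """Two latent variables: z1 ~ Exp(1), and z2 = w1*z1 + w2*z1**2 + eps
    with eps ~ Exp(1) (exponential-family noise). The causal weights
    (w1, w2) change across environments, while the nonlinear mixing from
    latents to observations, x = tanh(A z), is shared by all environments."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(3, 2))              # environment-invariant mixing
    datasets = []
    for w1, w2 in env_weights:
        z1 = rng.exponential(1.0, size=n)
        z2 = w1 * z1 + w2 * z1 ** 2 + rng.exponential(1.0, size=n)
        Z = np.column_stack([z1, z2])
        datasets.append(np.tanh(Z @ A.T))    # observed low-level data
    return datasets

# Three environments differing only in the latent causal weights.
envs = sample_latent_polynomial_scm([(1.0, 0.0), (0.5, 0.8), (-1.0, 0.3)])
```

Only the latent causal mechanism varies across environments here; it is exactly this variation that the identifiability theory exploits to recover the latent variables up to the stated indeterminacies.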